93 research outputs found
Mind2Web: Towards a Generalist Agent for the Web
We introduce Mind2Web, the first dataset for developing and evaluating
generalist agents for the web that can follow language instructions to complete
complex tasks on any website. Existing datasets for web agents either use
simulated websites or only cover a limited set of websites and tasks, thus not
suitable for generalist web agents. With over 2,000 open-ended tasks collected
from 137 websites spanning 31 domains and crowdsourced action sequences for the
tasks, Mind2Web provides three necessary ingredients for building generalist
web agents: 1) diverse domains, websites, and tasks, 2) use of real-world
websites instead of simulated and simplified ones, and 3) a broad spectrum of
user interaction patterns. Based on Mind2Web, we conduct an initial exploration
of using large language models (LLMs) for building generalist web agents. While
the raw HTML of real-world websites are often too large to be fed to LLMs, we
show that first filtering it with a small LM significantly improves the
effectiveness and efficiency of LLMs. Our solution demonstrates a decent level
of performance, even on websites or entire domains the model has never seen
before, but there is still a substantial room to improve towards truly
generalizable agents. We open-source our dataset, model implementation, and
trained models (https://osu-nlp-group.github.io/Mind2Web) to facilitate further
research on building a generalist agent for the web.Comment: website: https://osu-nlp-group.github.io/Mind2We
- …